Multi-modal dialog reply retrieval based on contrast learning and GIF tag
Yirui HUANG, Junwei LUO, Jingqiang CHEN
Journal of Computer Applications    2024, 44 (1): 32-38.   DOI: 10.11772/j.issn.1001-9081.2022081260

GIFs (Graphics Interchange Format images) are frequently used as replies to posts on social media platforms, but many existing approaches to the question “how to choose an appropriate GIF to reply to a post” make little use of the GIF tag information available on those platforms. To address this, a Multi-Modal Dialog reply retrieval approach based on Contrastive learning and GIF Tags (CoTa-MMD) was proposed, which integrates tag information into the retrieval process. Specifically, tags are used as intermediate variables, so that text-to-GIF retrieval is converted into a two-step text-to-tag-to-GIF retrieval. The modal representations are learned with a contrastive learning algorithm, and the retrieval probability is calculated using the law of total probability. Compared with direct text-image retrieval, introducing tags as a transition reduces the retrieval difficulty caused by the heterogeneity of the different modalities. Experimental results show that, compared with the DSCMR (Deep Supervised Cross-Modal Retrieval) model, CoTa-MMD improved the recall sum on the text-image retrieval task by 0.33 percentage points on the PEPE-56 multimodal dialogue dataset and by 4.21 percentage points on the Taiwan multimodal dialogue dataset.
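
The text-to-tag-to-GIF idea can be illustrated with a minimal sketch of marginalizing over tags, P(gif | text) = Σ_tag P(tag | text) · P(gif | tag). This is not the authors' implementation: the function names, the dot-product similarity, the softmax normalization, the temperature value, and the random embeddings below are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def unit_rows(x):
    """L2-normalize the last axis so dot products act as cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_via_tags(text_emb, tag_embs, gif_embs, temperature=0.1):
    """Score every GIF for one text query by summing over intermediate tags.

    Hypothetical helper: the similarity function and temperature are
    illustrative assumptions, not details taken from the paper.
    """
    # P(tag | text): one distribution over tags, shape (num_tags,)
    p_tag_given_text = softmax(tag_embs @ text_emb / temperature)
    # P(gif | tag): one distribution over GIFs per tag, shape (num_tags, num_gifs)
    p_gif_given_tag = softmax(tag_embs @ gif_embs.T / temperature, axis=1)
    # Law of total probability: sum over tags of P(tag | text) * P(gif | tag)
    return p_tag_given_text @ p_gif_given_tag


# Random stand-ins for embeddings that a contrastively trained encoder would produce.
rng = np.random.default_rng(0)
text_emb = unit_rows(rng.normal(size=16))
tag_embs = unit_rows(rng.normal(size=(5, 16)))
gif_embs = unit_rows(rng.normal(size=(8, 16)))

p_gif_given_text = retrieve_via_tags(text_emb, tag_embs, gif_embs)
ranking = np.argsort(-p_gif_given_text)  # GIF indices, most probable first
```

Because each factor is a proper probability distribution, the resulting scores over GIFs also sum to one, so they can be ranked directly to produce the retrieval list.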
